44 research outputs found

    A Database and Evaluation for Classification of RNA Molecules Using Graph Methods

    In this paper, we introduce a new graph dataset based on the representation of RNA. The dataset includes 3178 RNA chains labelled in 8 classes according to their reported biological functions. The goal of this database is to provide a platform for investigating the classification of RNA using graph-based methods. The molecules are represented by graphs encoding the sequence and base pairs of the RNA, with a number of labelling schemes using base labels and local shape. We report the results of a number of state-of-the-art graph-based methods on this dataset as a baseline comparison and investigate how these methods can be used to categorise RNA molecules by their type and function. The methods applied are the Weisfeiler-Lehman and optimal assignment kernels, the shortest paths kernel, and the all paths and cycles method. We also compare to the standard Needleman-Wunsch algorithm used in bioinformatics for DNA and RNA comparison, and demonstrate the superiority of graph kernels even on a string representation. The highest classification rate is obtained by the WL-OA algorithm using base labels and base-pair connections.
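    As a rough illustration of the kernel idea behind these baselines, the following is a minimal, self-contained sketch of a Weisfeiler-Lehman subtree kernel in Python (not the WL-OA variant reported as best here, and not the authors' implementation); the toy graphs, adjacency encoding, and base labels are illustrative assumptions.

        from collections import Counter

        def wl_label_histogram(adj, labels, iterations=3):
            # Weisfeiler-Lehman label refinement: at every iteration each node is
            # relabelled by (its label, sorted multiset of neighbour labels); the
            # histogram accumulates labels from all iterations, including the initial ones.
            history = Counter(labels.values())
            current = dict(labels)
            for _ in range(iterations):
                refined = {}
                for node, neighbours in adj.items():
                    signature = (current[node], tuple(sorted(current[n] for n in neighbours)))
                    refined[node] = hash(signature)   # compressed label for this subtree pattern
                current = refined
                history.update(current.values())
            return history

        def wl_kernel(graph_a, graph_b, iterations=3):
            # Kernel value = dot product of the two graphs' accumulated label histograms.
            h_a = wl_label_histogram(*graph_a, iterations=iterations)
            h_b = wl_label_histogram(*graph_b, iterations=iterations)
            return sum(h_a[lab] * h_b[lab] for lab in h_a.keys() & h_b.keys())

        # Toy RNA-like graphs: nodes are bases, edges mix backbone and base-pair connections.
        adj_a    = {0: [1], 1: [0, 2, 3], 2: [1, 3], 3: [1, 2]}
        labels_a = {0: "A", 1: "U", 2: "G", 3: "C"}
        adj_b    = {0: [1], 1: [0, 2], 2: [1]}
        labels_b = {0: "A", 1: "U", 2: "G"}

        print(wl_kernel((adj_a, labels_a), (adj_b, labels_b)))

    Filling a full Gram matrix with such values and passing it to an SVM with a precomputed kernel is the standard way graph-kernel baselines of this kind are evaluated.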

    Identifying the Machine Learning Family from Black-Box Models

    We address the novel question of determining which kind of machine learning model is behind the predictions when we interact with a black-box model. This may allow us to identify families of techniques whose models exhibit similar vulnerabilities and strengths. In our method, we first consider how an adversary can systematically query a given black-box model (oracle) to label an artificially generated dataset. This labelled dataset is then used to train different surrogate models, each one trying to imitate the oracle's behaviour. The method has two different approaches. First, we assume that the family of the surrogate model that achieves the maximum Kappa metric against the oracle labels corresponds to the family of the oracle model. The other approach, based on machine learning, consists of learning a meta-model that is able to predict the model family of a new black-box model. We compare these two approaches experimentally, giving us insight into how explanatory and predictable our concept of family is.
    This material is based upon work supported by the Air Force Office of Scientific Research under award number FA9550-17-1-0287, the EU (FEDER), the Spanish MINECO under grant TIN 2015-69175-C4-1-R, and the Generalitat Valenciana PROMETEOII/2015/013. F. Martinez-Plumed was also supported by INCIBE under grant INCIBEI-2015-27345 (Ayudas para la excelencia de los equipos de investigacion avanzada en ciberseguridad). J. Hernández-Orallo also received a Salvador de Madariaga grant (PRX17/00467) from the Spanish MECD for a research stay at the CFI, Cambridge, and a BEST grant (BEST/2017/045) from the GVA for another research stay at the CFI.
    Fabra-Boluda, R.; Ferri Ramírez, C.; Hernández-Orallo, J.; Martínez-Plumed, F.; Ramírez Quintana, M.J. (2018). Identifying the Machine Learning Family from Black-Box Models. Lecture Notes in Computer Science 11160:55-65. https://doi.org/10.1007/978-3-030-00374-6_6
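    The maximum-Kappa approach described above can be sketched in a few lines with scikit-learn; the stand-in oracle, the candidate families, the artificial query distribution, and the split sizes below are illustrative assumptions rather than the authors' experimental setup.

        import numpy as np
        from sklearn.datasets import make_classification
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.linear_model import LogisticRegression
        from sklearn.metrics import cohen_kappa_score
        from sklearn.naive_bayes import GaussianNB
        from sklearn.neighbors import KNeighborsClassifier
        from sklearn.svm import SVC
        from sklearn.tree import DecisionTreeClassifier

        # Stand-in black-box oracle; in practice this is a remote model we can only query.
        X_priv, y_priv = make_classification(n_samples=500, n_features=10, random_state=0)
        oracle = RandomForestClassifier(random_state=0).fit(X_priv, y_priv)

        # Step 1: generate artificial query points and let the oracle label them.
        rng = np.random.default_rng(0)
        X_query = rng.normal(size=(2000, 10))
        y_oracle = oracle.predict(X_query)

        # Step 2: train one surrogate per candidate family and measure agreement
        # with the oracle (Cohen's Kappa) on queries the surrogate was not trained on.
        candidates = {
            "tree": DecisionTreeClassifier(random_state=0),
            "ensemble": RandomForestClassifier(random_state=0),
            "svm": SVC(),
            "knn": KNeighborsClassifier(),
            "naive_bayes": GaussianNB(),
            "linear": LogisticRegression(max_iter=1000),
        }
        scores = {}
        for family, surrogate in candidates.items():
            surrogate.fit(X_query[:1500], y_oracle[:1500])
            scores[family] = cohen_kappa_score(y_oracle[1500:], surrogate.predict(X_query[1500:]))

        # Step 3: the first approach picks the family whose surrogate agrees most with the oracle.
        print(max(scores, key=scores.get), scores)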

    Prototype generation on structural data using dissimilarity space representation

    Data reduction techniques play a key role in instance-based classification by lowering the amount of data to be processed. Among the different existing approaches, prototype selection (PS) and prototype generation (PG) are the most representative ones. These two families differ in the way the reduced set is obtained from the initial one: while the former aims at selecting the most representative elements from the set, the latter creates new data out of it. Although PG is considered to delimit decision boundaries more efficiently, the operations required are not so well defined in scenarios involving structural data such as strings, trees, or graphs. This work studies the possibility of using dissimilarity space (DS) methods as an intermediate process for mapping the initial structural representation to a statistical one, thereby allowing the use of PG methods. A comparative experiment over string data is carried out in which our proposal is compared against PS methods in the original space. Results show that the proposed strategy achieves results statistically comparable to PS in the initial space, thus standing as a clear alternative to the classic approach, with some additional advantages derived from the DS representation.
    This work was partially supported by the Spanish Ministerio de Educación, Cultura y Deporte through an FPU fellowship (AP2012-0939), the Vicerrectorado de Investigación, Desarrollo e Innovación de la Universidad de Alicante through the FPU programme (UAFPU2014-5883), and the Spanish Ministerio de Economía y Competitividad through project TIMuL (No. TIN2013-48152-C2-1-R, supported by EU FEDER funds).
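    The pipeline described above (map strings into a dissimilarity space, then generate prototypes there) can be sketched as follows; the toy strings, the choice of all training strings as references, and k-means centroids as the PG step are illustrative assumptions, not the paper's exact configuration.

        import numpy as np
        from sklearn.cluster import KMeans
        from sklearn.neighbors import KNeighborsClassifier

        def edit_distance(a, b):
            # Plain Levenshtein distance between two strings (dynamic programming).
            prev = list(range(len(b) + 1))
            for i, ca in enumerate(a, 1):
                curr = [i]
                for j, cb in enumerate(b, 1):
                    curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
                prev = curr
            return prev[-1]

        def to_dissimilarity_space(strings, references):
            # Map each string to a vector of distances to the reference set.
            return np.array([[edit_distance(s, r) for r in references] for s in strings])

        # Toy labelled string data; a real experiment would use e.g. contour strings or words.
        train = ["abba", "abca", "abcc", "xyzz", "xyyz", "wxyz"]
        labels = np.array([0, 0, 0, 1, 1, 1])
        references = train                      # simplest choice: all training strings as references
        X = to_dissimilarity_space(train, references)

        # Prototype generation in the vector space: replace each class by k-means
        # centroids (here a single centroid per class).
        prototypes, proto_labels = [], []
        for c in np.unique(labels):
            centroids = KMeans(n_clusters=1, n_init=10, random_state=0).fit(X[labels == c]).cluster_centers_
            prototypes.append(centroids)
            proto_labels.extend([c] * len(centroids))
        prototypes = np.vstack(prototypes)

        # Classify new strings by 1-NN against the generated prototypes.
        clf = KNeighborsClassifier(n_neighbors=1).fit(prototypes, proto_labels)
        X_test = to_dissimilarity_space(["abcb", "xxyz"], references)
        print(clf.predict(X_test))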

    A note on core research issues for statistical pattern recognition

    This paper aims to stimulate discussion in the pattern recognition community on the structural differences between statistical pattern recognition and closely related disciplines.

    A Fast Approach to Improve Classification Performance of ECOC Classification Systems

    Error correcting output coding (ECOC) is a well-known technique to decompose a multi-class classification problem into a group of two-class problems that can be addressed by a combination of binary classifiers, each of them trained on a different dichotomy of the classes. The way the set of classes is mapped onto this set of dichotomies may strongly influence the obtained performance. In this paper we present a new tool, the k-NN lookup table, to optimize this mapping quickly, together with a fast procedure to change the dichotomies appropriately. Experiments on artificial and public data sets show that the proposed procedure may significantly improve the ECOC performance in multi-class problems.
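    For context, a plain ECOC pipeline with a random coding matrix and Hamming-distance decoding can be sketched as follows; it does not implement the k-NN lookup table or the dichotomy-update procedure proposed in the paper, and the dataset, code length, and base learner are illustrative assumptions.

        import numpy as np
        from sklearn.datasets import load_iris
        from sklearn.svm import SVC

        def random_code(n_classes, n_columns, rng):
            # One row per class, one column per dichotomy; keep only columns that
            # actually split the classes into two non-empty groups.
            columns = []
            while len(columns) < n_columns:
                column = rng.choice([-1, 1], size=n_classes)
                if len(set(column)) == 2:
                    columns.append(column)
            return np.array(columns).T

        X, y = load_iris(return_X_y=True)
        rng = np.random.default_rng(0)
        code = random_code(n_classes=3, n_columns=8, rng=rng)

        # Train one binary learner per dichotomy: each sample is relabelled by the
        # code bit of its class in that column.
        learners = [SVC().fit(X, code[y, col]) for col in range(code.shape[1])]

        # Decode: predict the bit string of a sample, then choose the class whose
        # code word is closest in Hamming distance.
        bits = np.column_stack([clf.predict(X) for clf in learners])
        hamming = np.array([[(b != word).sum() for word in code] for b in bits])
        predictions = hamming.argmin(axis=1)
        print("training accuracy:", (predictions == y).mean())

    scikit-learn's OutputCodeClassifier provides a ready-made version of this random-code baseline.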

    Almost autonomous training of mixtures of principal component analyzers

    In recent years, a number of mixtures of local PCA models have been proposed. Most of these models require the user to set both the number of submodels (local models) in the mixture and the dimensionality of the submodels (i.e., the number of PCs). To make the model free of these parameters, we propose a greedy expectation-maximization algorithm to find a suboptimal number of submodels. For a given retained variance ratio, the proposed algorithm estimates for each submodel the dimensionality that retains this ratio of variability. We test the proposed method on two different classification problems: handwritten digit recognition and two-class ionosphere data classification. The results show that the proposed method performs well.
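    The per-submodel dimensionality selection by retained variance ratio can be sketched as below; a fixed k-means partition stands in for the greedy EM search over the number of submodels, and the dataset, cluster count, and target ratio are illustrative assumptions.

        import numpy as np
        from sklearn.cluster import KMeans
        from sklearn.datasets import load_digits
        from sklearn.decomposition import PCA

        X, _ = load_digits(return_X_y=True)
        retained = 0.90                      # the user-chosen retained variance ratio

        # Stand-in for the greedy EM submodel search: a fixed k-means partition.
        parts = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(X)

        submodels = []
        for k in np.unique(parts):
            # With a float in (0, 1), PCA keeps the smallest number of components whose
            # cumulative explained variance reaches that ratio, which mirrors the
            # per-submodel dimensionality selection described in the abstract.
            pca = PCA(n_components=retained, svd_solver="full").fit(X[parts == k])
            submodels.append(pca)
            print(f"submodel {k}: {pca.n_components_} components")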